Learning Automata based Algorithms for Finding Optimal Policies in Fully Cooperative Markov Games

Authors

  • Behrooz MASOUMI
  • Mohammad Reza MEYBODI
  • Farnaz ABTAHI
Abstract

Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems. In this paper, several learning automata based multi-agent system algorithms for finding optimal policies in fully cooperative Markov games are proposed. In the proposed algorithms, the Markov problem is described as a directed graph in which the nodes are the states of the problem and the directed edges represent the actions that result in a transition from one state to another. Each state of the environment is equipped with a variable-structure learning automaton whose actions are moves to the states adjacent to that state. Each agent moves from one state to another and tries to reach the goal state. In each state, the agent chooses its next transition with the help of the learning automaton in that state. The actions taken by the learning automata along the path traveled by the agent are then rewarded or penalized, according to a learning algorithm, based on the value of the traveled path. In the second group of proposed algorithms, the concept of entropy is imported into learning automata based multi-agent systems to drive the magnitude of the reinforcement signal given to the learning automata (LA) and improve the performance of the algorithms. Experimental results show that the proposed algorithms perform better than existing learning automata based algorithms in terms of both the speed and the accuracy of reaching the optimal policy.

Summary. Several learning automata based multi-agent system algorithms are presented for finding an optimal policy in a cooperative Markov game. The Markov process is described as a graph whose nodes describe the states of the problem and whose edges represent the actions. (Learning automata based algorithm for finding an optimal strategy in a cooperative Markov game)
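The graph-based scheme described in the abstract can be illustrated with a minimal single-agent sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the update rule, so a linear reward-inaction (L_R-I) update is used; a path is rewarded when it is no longer than the shortest path seen so far, as a stand-in for "the value of the traveled path"; and the entropy-scaled step size is only one hypothetical way to let entropy drive the reinforcement magnitude. The names `LearningAutomaton`, `run_episode`, `train`, and `GRAPH` are illustrative.

```python
import math
import random


class LearningAutomaton:
    """Variable-structure automaton with one action per adjacent state."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.probs = [1.0 / len(self.actions)] * len(self.actions)

    def choose(self, rng):
        # Sample an action index according to the current probability vector.
        return rng.choices(range(len(self.actions)), weights=self.probs)[0]

    def reward(self, index, rate):
        # Assumed L_R-I update: move probability mass toward the chosen action.
        self.probs = [
            p + rate * (1.0 - p) if i == index else p * (1.0 - rate)
            for i, p in enumerate(self.probs)
        ]

    def entropy(self):
        # Shannon entropy (in bits) of the action probability vector.
        return -sum(p * math.log2(p) for p in self.probs if p > 0.0)

    def normalized_entropy(self):
        n = len(self.actions)
        return self.entropy() / math.log2(n) if n > 1 else 0.0


def run_episode(automata, start, goal, rng, max_steps=50):
    """Walk from start toward goal; return [(state, action index)] or None."""
    state, path = start, []
    for _ in range(max_steps):
        if state == goal:
            return path
        if state not in automata:  # dead end that is not the goal
            return None
        index = automata[state].choose(rng)
        path.append((state, index))
        state = automata[state].actions[index]
    return path if state == goal else None


def train(graph, start, goal, episodes=2000, rate=0.05, use_entropy=False, seed=0):
    rng = random.Random(seed)
    automata = {s: LearningAutomaton(nbrs) for s, nbrs in graph.items() if nbrs}
    best = math.inf
    for _ in range(episodes):
        path = run_episode(automata, start, goal, rng)
        # Assumed criterion: reward paths no longer than the best seen so far.
        if path is not None and len(path) <= best:
            best = len(path)
            for state, index in path:
                la = automata[state]
                # Hypothetical entropy scaling: larger steps while still uncertain.
                step = rate * (1.0 + la.normalized_entropy()) / 2.0 if use_entropy else rate
                la.reward(index, step)
    return automata, best


# Small illustrative directed graph: nodes are states, edges are actions.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D", "G"], "D": ["G"], "G": []}
```

Calling `train(GRAPH, "A", "G")` returns the per-state automata and the length of the shortest rewarded path; inspecting each automaton's `probs` shows which outgoing edge it has come to prefer. A multi-agent version would need several agents sharing or coordinating these automata, which this sketch does not attempt.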


Similar resources

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...


Learning Automata Based Multi-agent System Algorithms for Finding Optimal Policies in Markov Games

Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems (MAS). The Markov game view of MAS is considered as a sequence of games having to be played by multiple players while each game belongs to a different state of the environment. In this paper, several learning automata based multiagent system algorithms f...


Speeding up learning automata based multi agent systems using the concepts of stigmergy and entropy

Learning automata (LA) were recently shown to be valuable tools for designing Multi-Agent Reinforcement Learning algorithms and are able to control the stochastic games. In this paper, the concepts of stigmergy and entropy are imported into learning automata based multi-agent systems with the purpose of providing a simple framework for interaction and coordination in multi-agent systems and spe...


An Analysis of non-Markov Automata Games: Implications for Reinforcement Learning

It has previously been established that for Markov learning automata games, the game equilibria are exactly the optimal strategies (Witten, 1977; Wheeler & Narendra, 1986). In this paper, we extend the game theoretic view of reinforcement learning to consider the implications for "group rationality" (Wheeler & Narendra, 1986) in the more general situation of learning when the Markov property ca...


Correlated Q-Learning

Recently, there have been several attempts to design multiagent learning algorithms that learn equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. This paper introduces correlated-Q learning, one such algorithm. The contributions of this paper are twofold: (i) We show empirically that correlated-Q learns correlated equilibri...




Publication date: 2012